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Abstract 

We propose a new class of interacting Markov chain Monte Carlo 
(MCMC) algorithms designed for increasing the efficiency of a modified 
multiple-try Metropolis (MTM) algorithm. The extension with respect 
to the existing MCMC literature is twofold. The sampler proposed ex- 
tends the basic MTM algorithm by allowing different proposal distri- 
butions in the multiple-try generation step. We exploit the structure of 
the MTM algorithm with different proposal distributions to naturally 
introduce an interacting MTM mechanism (IMTM) that expands the 
class of population Monte Carlo methods. We show the validity of the 
algorithm and discuss the choice of the selection weights and of the 
different proposals. We provide numerical studies which show that the 
new algorithm can perform better than the basic MTM algorithm and 
that the interaction mechanism allows the IMTM to efficiently explore 
the state space. 



1 Introduction 



Markov chain Monte Carlo (MCMC) algorithms are now essential for the 
analysis of complex statistical models. In the MCMC universe, one of the 
most widely used class of algorithms is define d by the Metropolis-Hastings 
(MH) ( Metropolis et al. . 19531 : Hastings . 197d ) and its variants. An impor- 
tant generalization of the standard MH formulation is represented by the 
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multiple-try Metropolis (MTM) (|Liu et all boool ) . While in the MH formu 



lation one accepts or rejects a single proposed move, the MTM is designed 
so that the next state of the chain is selected among multiple proposals. The 
multiple-proposal setup can be used effectively to explore the sample space of 
the target distribution and subsequent developmen ts made use of this added 
flexibility. For instance. ICraiu and Lemieux propose to use antithetic 

and quasi-Monte Carlo samples t o generate the proposal s and to improve the 
efficiency of the algorithm while iPandolfi et al.l (|2010bl ) and iPandolfi et al.l 



apply the multiple-proposal idea to a trans-dimensional setup and 
combine Reversible Jump MCMC with MTM. 



This work further generalizes the MTM algorithm presented in lLiu et al. 



(I2OO0I ) in two directions. First, we show that the original MTM transition 
kernel can be modified to allow for different proposal distributions in the 
multiple-try generation step while preserving the ergodicity of the chain. 
The use of different proposal distributions gives more freedom in designing 
MTM algorithms for target distributions that require different proposals 
across the sample space. An important challenge remains the choice of the 
distributions used to generate the proposals and we propose to address it 
by expanding upon methods used within the population Monte Carlo class 
of algorithms. 



The class of population Monte Ca r lo pr o cedures (ICappe et al.l . 12004 : 



Del Moral and Miclol . bood : IPel Morai l2004l : I.Tasra et al.l . l2007l ) has been 



designed to address the inefficiency of classical MCMC samplers in complex 
applications invo l ving i nultimodal and hig h dimensional target distributions 
dPritchard et all . I2OOOI : iHeard et al l. I2OO6I ) . Its formulation relies on a num- 
ber of MCMC processes that are run in parallel while learning from one 
another about the geography of the target distribution. 

A second contribution of the paper is finding reliable generic methods for 
constructing the proposal distributions for the MTM algorithm. We propose 
an interacting MCMC sampling design for the MTM that preserves the 
Markovian property. More specifically, in the proposed interacting MTM 
(IMTM) algorithm, we allow the distinct proposal distributions to use in- 
formation produced by a population of auxiliary chains. We infer that the 
resulting performance of the MTM is tightly connected to the performance 
of the chains' population. In order to maximize the latter, we propose a 
number of strategies that can be used to tune the auxiliary chains. We 
also adapt previous extensions of the MTM and link the use o f stoch astic 
overrelaxation, random-ray Monte Carlo method (see Liu et al. . 2000l ) and 
simulated annealing to IMTM. 

In the next section we discuss the IMTM algorithm, propose a number 
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of alternative implementations and prove their ergodicity. In Section 3 we 
focus on some special cases of the IMTM algorithm and in Section 4 the 
performance of the methods proposed is demonstrated with simulations and 
real examples. We end the paper with a discussion of future directions for 
research. 

2 Interacting Monte Carlo Chains for MTM 

We begin by describing the MTM and its extension for using different pro- 
posal distributions. 

2.1 Multiple-Try Metropolis With Different Proposal Dis- 
tributions 

Suppose that of interest is sampling from a distribution tt that has support 
in y C R*^ and is known up to a normalizing constant. Assuming that the 
cu rrent stat e of th e chain is x, the update defined by the MTM algorithm 
of Liu et al. is described in Algorithm [TJ 



Algorithm 1. Multiple-try Metropolis Algorithm (MTM) 

1. Draw M trial proposals yi, . . . ,yM from the proposal distribu- 
tion T{-\x). Compute w{yj,x) for each j £ {!,■•. ,M}, where 
w{y,x) = TT{y)T{x\y)X{y,x), and X{y,x) is a symmetric func- 
tion of X, y. 

2. Select y among the M proposals with probability proportional to 
w{yj,x),j = 1,...,M. 

3. Draw x\, . . . ■,x\,i_^ variates from the distribution T{-\y) and let 
x\j = X. 

4- Accept y with generalized acceptance probability 

w{yi,x) + . . . + w{yM,x) 
w{xl,y) + ...-^w{xlj,y) 



p = min < 1 



Note that while the MTM uses the same distribution to generate all 
the proposals, it is possible to extend this formulation to different proposal 
distributions without altering the ergodicity of the associated Markov chain. 
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Let Tj{x, •), with j = 1, . . . ,M, be a set of proposal distributions for 
which Tj{x, y) > if and only if Tj{y, x) > 0. Define 

Wj{x,y) = TT{x)Tj{x,y)Xj{x,y) j = 1,...,M 

where Xj{x,y) is a nonnegative symmetric function in x and y that can be 
chosen by the user. The only requirement is that Xj{x,y) > whenever 
T{x, y) > 0. Then the MTM algorithm with different proposal distributions 
is given in Algorithm [2j 



Algorithm 2. MTM with Different Proposal Distributions 



1. Draw independently M proposals yi,---,yM such that yj ~ 
Tj{x, •). Compute Wj{yj,x) for j = 1, . . . , M. 

2. Select Y = y among the trial set {yi, . . . , y^v/} with probability 
proportional to Wj{yj,x), j = 1, . . . , M. Let J be the index of the 
selected proposal. Then draw x* ~ Tj{y, ■), j ^ J, j = 1, . . . ,M 
and let Xj = x. 

3. Accept y with probability 

. f wi(yi,x)H VwM{yM-,x)\ 

p = mm < 1, — — — r — > 

I wi{xl,y)^ hwM{xM,y)} 

and reject with probability 1 — p- 

It should be noted that Algorithm [2] is a special case of the interacting 
MTM presented in the next section and that the proof of ergodicity for 
the associated chain follows closely the proof given in Appendix A for the 
interacting MTM and is not given here. 

In Section 4 we will show, through simulation experiments, that this 
algorithm is more efficient then a MTM algorithm with a single proposal 
distribution. 

2.2 General Construction 

Undoubtedly, Algorithm [2] offers additional flexibility in organizing the MTM 
sampler. This section introduces generic methods for using a population of 
MCMC chains to define the proposal distributions. 
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Algorithm 3. Interacting Multiple Try Algorithm (IMTM) 



• For i = 1, . . . ,N 

(i) 

1. Let X = xii , for j = l,...,Mj draw yj ~ 
Tj^\-\xn '^ ^KxjXn'^^''^^) independently and compute 

wf\y„x) = T,{y,)Tf{y^\x\^--'-^\x,x^+^--''^)\f{y,,x). 

2. Select J G {l,...,Mj} with probability proportional to 
wfiVj, x), j = 1, . . . , Mi and set y = yj. 

3. For j = l,...,Mj and j 7^ J draw Xj ~ 
Tj'^\:\xn'^ ^\y,Xn^^'^^), let x*j = x^n and compute 

wf{x*,y) = ^(x*)7;«(x*|x(,i^-i),2/,x(:+i^^))Af(x*,y). 

4. Set x^^j^-^ = y with probability 

. f w'~l\yi,x) + ... + w'~^i^{yM,,x)\ 

Pi = mm < 1, — pj > 

[ w'{\xl,y) + ... + w^^^{xlj^,y)] 

and x^*l^ = Xn^ with probability 1 — pi. 



Consider a population of N chains, X^^^ = {xn }nGN and i = 1, . . . , N . 
For full generality we assume that the ith. chain has MTM transition kernel 
with Mi different proposals {Tj^^}i<j<M^ (if we set Mj = 1 we imply that 
the chain has a MH transition kernel). The interacting mechanism allows 
each proposal distribution to possibly depend on the values of the chains at 
the previous step. Formally, if H„ = {xn^}^^ is the vector of values taken 
at iteration n € N by the population of chains, then we allow each proposal 
distribution used in updating the population at iteration n + 1 to depend on 
H„. The mathematical formalization is used in the description of Algorithm 
[3l One expects that the chains in the population are spread throughout the 
sample space and thus the proposals generated are a good representation of 
the sample space y ultimately resulting in better mixing for the chain of 
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interest. 

In order to give a representation of the IMTM transition density let us 
introduce the fohowing notation. Let T^*^ {x, y) = tJ*^ {y\xn''^ , x, Xn~^^'^^ ) , 

r(^)(x,yi:Mj = UktiTjf\x,yk) and T^]{x,yi.,M,) = Uk^^jT^Hx^yk) and 

define dyi-M, = Ilt=i '^Vk and dy-j = Y[k% dVk- 

The transition density associated to the population of chains is then 

N 

(i) J') 



i=l 

where 

Mi f \ 

K,ix,y) = Y,^i^^y)Tf{x,y) + 1 -Y^Bfix) 5,{y) (2) 

is the transition kernel associated to the z-th chain of algorithm with 
and 



A)'{x,y) = / w)\y,x)p)\x,y)T^{y,xlj^j)T^ {x,yi.Mjdx*_jdy^j 



Pi {x, y)T^': {y, dxlM.)T^'\x, dyi.,M,)dx*_,dyi;M, 



In the above equations w^j\yj,x) = w^j\yj,x)/{w^j\y,x) + iv^^\{yi;Mi\x)), 

with j = 1, . . . , Mi and w^^^^iyi-.Milx) = J2h^j w^k\yk,x), are the normalized 
weights used in the selection step of the IMTM algorithm and 

(i), , . \, wf{y,x) + iv'^^]{yi.,M,\x)\ 

Pj [x, y) = mm < 1, > 

[ wf{x,y)+w^y.{x\.j^.j^\y)] 

is the generalized MH ratio associated to a MTM algorithm. 

The validity of Algorithm [3] relies upon the detailed balance condition. 

Theorem 1. The transition density i^j(xn\ a^i^+i) associated to the i-th 
chain of the IMTM algorithm satisfies the conditional detailed balanced con- 
dition. 
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Proof See Appendix A. 

Since the transition Ki{x^n ,x']^j^'^, i = 1, . . . , N has Tr{x) as stationary 
distribution and satisfies the conditional detailed balance condition then the 
joint transition Er(H„,H„+i) has 7r(x)^ as a stationary distribution. 

An important issue directly connected to the practical implementation of 
the IMTM is the choice of proposal distributions and the choice of Xj'\x, y). 
First it should be noted that at each iteration of the interacting chains 
the computational complexity of the algorithm is 0{N'^f^^Mi). When 
considering the number of chains and the number of proposals, there are 
two possible strategies in designing the interaction mechanism. 

The first strategy is to use a small number of chains, say 2 < < 5, 
in order to improve the mixing of each chain and to allow for large jumps 
between different regions of the state space. When applying this strategy 
to our IMTM algorithms it is possible to set the number of proposals to be 
equal to the number of chains, i.e. Mj = A^, for all 1 < i < A". In this 
way all the chains can interact at each iteration of the algorithm and many 
search directions can be included among the proposals. 

A second strategy is to use a higher number of chains, e.g. A^ = 100, in 
order to possibly have, at each iteration, a good approximation of the target 
or a much higher number of search directions for a good exploration of 
the sample space. This algorithm design strategy is common in Population 
Monte Carlo or Interacting MCMC methods. Clearly when a high number 
of chains is used within IMTM, it is necessary to set Mj < A^. In the next 
section we discuss a few strategies to built the Mj proposals. 



2.3 Parsing the Population of Auxiliary Chains 

One of the strategies that revealed to be successful in our applications con- 
sists in the random selection of a certain number of chains of the popula- 
tion in order to build the proposals. More specifically, we let Mj = M, 
for all z, and when updating the i-th chain of the population we sample 
Ii,...,Im random indexes from the uniform distribution W{i,...,Ar}, with 

A^ > M, and then set the proposals: Tj^\y,x) = Tj^\-\x, Xn^\ . . . ,Xn^'^), 
for all j = 1, . . . , M. On the basis of our simulation experiments we found 
that the following choice Tj^\- ■ ■ \xn^\ . . . , Xn^''^) = Tj{-\xn^^) is works well 
in improving the mixing of the chains. 

Previously suggested forms for the function X) {x, y) (jLiu et all , lioool ) 

are: 
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a) \f{x,y) = l 

b) \f{x,y) = 2{Tf\x,y)+Tf\y,x)}-^ 

c) Xf{x,y) = {Tf (x,y)rf a > 0. 

Here we propose to include in the choice of A the information provided by the 
population of chains. Therefore we suggest to modify the above functions 
as follows 

a') \f{x,y) = Uj 

h') \fix,y) = 2v, {Tf{x,y)+Tf{y,x)}-^ 

d) \f{x,y) = {Tf (x,y)Tf a > 

where the factor Vj captures the behaviour of the auxiliary chains at the 
previous iteration 

i=l 

(i) 

where J^-i is the random index of the selection step at the iteration n — 1 for 
the i-th chain. The modifications proposed for A(-, •) would increase the use 
of those proposal distributions favoured by the population of chains at pre- 
vious iteration. Since z^j depends only on samples generated at the previous 
step by the population of chains, the ergodicity of the IMTM chain is pre- 
served. An alternative strategy is to sample the random indexes Ii, . . . , Im 
with probabilities proportional to I'j. 



2.4 Annealed IMTM 

Our belief in IMTM's improved performance is underpinned by the assump- 
tion that the population of Monte Carlo chains is spread throughout the 
sample space. This can be partly achieved by initializing the ch ains using 



draws from a distributioii over d ispersed with respect to vr (see also lJennison 



19931 : iGelman and Rubinl . \l99i ) and partly by modifying the stationary diS' 



tribution for some of the chains in the population. Specifically, we consider 
the sequence of annealed distributions irt = vr* with t € {'^i, ■^2) • • • , ^Af}, 
where 1 = > ^2 > • • • > '^n, for instance = 1/^- When t,s are 
close temperatures, iit is similar to tTs, but vr = vri may be much harder 
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to sample from than vr^^^ as has been long reco gnized in the simulated an- 
nealing and simulated tempering literature (see iMarinari and Parisil . Il992l : 
Gever and Thompsonl . Il994l : iNeal 1 19941 ') . Therefore, it is likely that some of 



the chains designed to sample from tti, . . . , tttv have good mixing properties, 
making them good candidates for the population of MCMC samplers needed 
for the IMTM. 

We thus consider the Monte Carlo population made of the — 1 chains 
having {712, . . . , ttat} as stationary distributions. An example of annealed 
interacting MTM is given in Algorithm [H Note that we let the i-th chain to 
interact only with the — i + 1 chains at higher temperature by sampling 

Ii,...,Im from ZY{i^...^7v-i+i}- 

An astute reader may have noticed that the use of MTM for each auxil- 
iary chain may be redundant since for smaller ^j's the distribution vTj is easy 
to sample from. In Algorithm [5] we present an alternative implementation 
of the annealed IMTM in which each auxiliary chain is MH with target vTj, 
l<i< N -1. 



Algorithm 4. Annealed IMTM Algorithm (AIMTMl) 
• For i = l,...,N 

(i) 

1. Let X = Xn and sample Ii, . . . , Im from ^{i,...,7v-i+i}- 

2. For j = 1, . . . ,M draw yj ~ Tj^\-\xn^^) independently and 
compute 

wf{yj,x) = 7r{yj)T^'\yj\xl!'^)Xf {yj,x). 

3. Select J € {!,..., M} with probability proportional to 
wf{yj,x), j = 1,...,M and set y = yj. 

4. For j = 1,...,M and j 7^ J draw Xj ~ 
Tj^\-\xn'^ ^\y,Xn^^'^^), let Xj = Xn^ and compute 

wf\x*,y) = 7rix*)Tj'^ix*\y)X^(x*,y). 

5. Set xjj^]^ = y with probability pi, where pi is the generalized 

M.H. ratio of the IMT algorithm and x^^^^-^ = Xn^ with 
probability 1 — pi. 
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Algorithm 5. Annealed IMTM Algorithm (AIMTM2) 



• For 1 = 1 

(i) 

1. Let X = Xn and sample Ii, . . . , Im from ^/{i,...,Ar}- 

2. For j = 1, . . . ,M draw yj ^ Tj^\-\xn^^) independently and 
compute 

wf^j,x) = 7r{y,)Tj'\y,\xi^^^)xf\y„x). 

3. Select J G {!,..., M} with probability proportional to 
wf^ {Vj^x), j = 1,...,M and set y = yj. 

4- For j = 1,...,M and j ^ J draw Xj ~ 
Tj^\-\xn ^ ^\y,Xn^^'^^), let x*j = Xn^ and Compute 

wfix*,y) = 7r{x])Tj'>{x;\y)xf{x*,y). 
5. Set a;^*+i = y with probability pi, where pi is the generalized 

(i) (i) 

M.H. ratio of the IMT algorithm and x\^-^ = Xn with 
probability 1 — pi. 

• Fori = 2,...,N 

1. Let and update the proposal function T^'^\-\x). 

2. Draw y T^'\-\ x) and compute 



mm 



n{y)^^T('\x\y) 
' 7r(x)^irW(y|a;) 



(i) (i) (i) 

3. Set x^_^_i — y with probability pi and x^_^_^ — Xn with prob- 
ability 1 — Pi- 



The chain of interest, corresponding to ^ = 1, has an MTM transition 
kernel with M = — 1 proposal distributions. At time n the j'th proposal 
distribution used for the chain ergodic to vr is Tf> = T^-?), is the same as 
the proposal used by the jth auxiliary chain, for all 2 < j < N. 
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An additional gain could be obtained if the auxiliary ch ains' transition 



kerne ls are modified using adaptive MCMC strategies (see also lChauveau and Vandekerkhove 



2002I . for another example of adaption for interacting chains). However, it 
should be noted that letting the auxiliary chains adapt indefinitely results in 
complex theoretical justifications for the IMTM which go beyond the scope 
of this paper and will be presented elsewhere. Our current recommendation 
is to use finite adaptation for the auxiliary chains prior to the start of the 
IMTM. One could take advantage of multi-processor computing units and 
use parallel programming to increase the computational efficiency of this 
approach. 

The adaptation of A^*^ (■,•)) through the weights Vj defined in the previous 
section, should be used cautiously in this case. The aim of the annealing 
procedure is to allow the higher temperatures chains to explore widely the 
sample space and to improve the mixing of the MTM chain. Using Vj the 
context of annealed IMTM could arbitrarily penalize some of the higher 
temperature proposals and reduce the effectiveness of the annealing strategy. 

It is possible to have a Monte Carlo approximation of a quantity of 
interest by using the output produced by all the chains in the population. 
For example let 

X = / h{x)'K[x)dx 

Jy 

be the quantity of interest where /i is a test function. It is possible to 
approximate this quantity as follows 



T , N 



'-NT 

I '■ — ' ( 



n=l ^ 



where , with n = 1, . . . , T and i = 1, . . . ,N is the output of a IMTM 
chains with target vr^' and ("j (x) = tt{x) /vr^^ (x) is a set of importance weights 
with normalizing constant ^ = X^j^i Cjixn^)- 

2.5 Gibbs within IMTM update 

It should be noticed that in the proposed algorithm at the n-th iteration the 
chains are updated simultaneously. In the interacting MCMC literature 
a sequential updating scheme (Gibbs-like upda ting) has been p r opose d for 



fr opose u 
(20Q^)- In 



the following we show that the Gibbs-like updating also apply to our IMTM 
context. In the Gibbs-like interacting MTM (GIMTM) algorithm given in 
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Algorithm [6] the different proposals functions of the i-th chain, with i = 
1,...,A^, may depend on the current values of the updated chains x^j^^, 

with J = 1, . . . , (i — 1) and on the last values Xn \ with j = (i + 1), . . . , N , 
of the chains which have not yet been updated. 

Algorithm 6. Gibbs-like IMTM Algorithm (GIMTM) 

• For i = 1, . . . , N 

1. For j = 1, . . . , Mi draw yj ~ T^'^\-\x^^^^^\ x^\ Xn~^^'^^) in- 
dependently and compute 

wf^,,x)=n(^,)Tj%,\x^:;'\x^,xt'''^^)xf{y,,x^). 

2. Select J € {!,..., Mj} with probability proportional to 
wf\yj,x), j = l,...,Mi and set y = yj. 

3. For j = l,...,Mj and j ^ J draw x* ~ 
^j*^('l^ra+i ^\yjXn~^^'^^), let Xj = Xn^ and compute 

wf(x*,y) = ^(xpT;^)(x*[x(^r'\?/,4-^'^^^)Af 

4. Set x^^]^ — y with probability 

. w'i\yi,x) + ... + w'-ll{yM,,x)\ 
Pi = mm < 1, 



w 



■J?{xl,y) + ... + w^^^{xl^^,y) 
and x'^'^-^ = Xn' with probability 1 — pi. 



In the GIMTM algorithm the iteration mechanism between the chains is 
not the same as in the IMTM algorithm. The chains are no longer indepen- 
dent since the proposals may depend on the current values for some of the 
chains in the population. The transition kernel for the whole population is 

N 
1=1 

and in this case the validity of the algorithm still relies upon the conditional 
detail balance condition given for the IMTM algorithm. 
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Finally we remark that the GIMTM algorithm allows us to introduce 
further po ssible choices for the Xj'^(x,y) functions. In particular a repulsive 
factor (see Mengersen and RobertI (|2003l )) can be introduced in the selection 
weights in order to induce negative dependence between the chains. We let 
the study of the GIMTM algorithm and the use of repulsive factors for future 
research and focus instead on the properties of the IMTM algorithm. 



3 Some generalizations 

In the following we will discuss some possible generalization of the IMTM 
algorithm. First we show how to use the stochastic overrelaxation method 
to possibly have a further gain in the efficiency. Secondly we suggest two 
possible strategies to built the different proposal functions of the IMTM. 
The first strategy consists in proposing values along different search direc- 
tions and re presents an extension of the random-ray Monte Carlo algorithm 
presented in iLiu et all (|200d 'l. The second strategy relies upon a suitable 



combination of target tempering and adaptive MCMC chains. 
3.1 Stochastic Overrelaxation 

Stochastic overrelaxation (SOR) is a Markov chain Monte Carlo technique 
developed by Adler (1981) for normal densities and subsequently extended 
by Green and Han (1992) for non-normal targets. The idea behind this 
approach is to induce negative correlation between consecutive draws of a 
single MCMC process. 

Within the MTM algorithm we can implement SOR by inducing negative 
correlation between the proposals and between the proposals and the current 
state of the chain, x. A natural and easy to implement procedure may 
be based on the assumption that {yi, . . . ,yM-i,x)'^ ~ NdxM{0,V) where 
y's structure is dictated by the desired negative dependence between the 
proposals yi, . . . , y^s and x, specifically 



V 



Si 

^12 



^12 
S2 



IM 



^2M 



For instance, 



we can set 



^2M ■ ■ ■ 

= whenever i,j ^ M and 



1/2 1/2 

RiM^M where Rim is a c orrelation matrix which corresponds to ex- 



treme negative correlation (see ICraiu and Mengl . l2005l . for a discussion of 
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extreme dependence) between any two components (with same index) of yi 
and X, for any 1 < i < M — 1. The general c onstruction falls within the 



context of dependent proposals as discussed by ICraiu and Lemieuxl (j2007l ) 
with the additional bonus of "pulling" the proposals away from the cur- 
rent state due to the imposed negative correlation. This essentially ensures 
that no proposals are exceedingly close to the current location of the chain. 
Also note that the construction is general enough and can be applied for 
Algorithms 2 and 3 as long as the proposal distributions are Gaussian. 

3.2 Multiple Random-ray Monte Carlo 

We show here that the use of different proposals for the MTM algorithm 



allows also to extend the random-ray Monte Carlo method given in lLiu et al. 



In particular the proposed algorithm allows to deal with multiple 



search directions at each iteration of the chains. At the n-th iteration of the 
chain, in order to update the set of chains H^, the algorithm performs for 

(r) ^ 

each chain Xn G H^, with r = 1, . . . ,N, the following steps: 

1. Evaluate the gradient log7r(x) at Xn^ and find the mode along 
Xn^ + rUn where n„ = Xn^ - x^^^l^. 

2. Sample /i, . . . , Im from the uniform ^/{i,...,r-i,r+i,...,Af}- 

3. Let Cnj = {a-n — Xn^^)/\\an — Xn^\ \ and sample rj from A/'(0,(T^). 

and then use the set of proposals Tj which depends on Cn,7' to perform a 
MTM transition with different proposals as in the IMTM algorithm (see 
Alg. I}. 



4 Simulation Results 
4.1 Single-chain results 

In this section we carry out, through some examples, a comparison between 
the single-chain multiple try algorithms MTM-DP with different proposals 
and the algorithm in Liu et al. (2000). In the MTM-DP algorithm we 
consider four Gaussian random- walk proposals ~ A/'n(x, Aj) with Ai = 
O.lldn, A2 = 5/(i„, A3 = 501 dn and A4 = 100/(i„, where Idn denotes the 
n-order identity matrix. In the MTM selection weights we set Aj(x,y) = 
2aj/(T,(x,y) + T,(y,x)), where aj = 0.25, for j = 1, . . . ,4. 
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In order to compare the MTM algorithm with different proposals and 
the Multiple Try algorithm of Liu et al. (2000) we consider 20,000 iterations 
and use four trials yj generated by the following proposal 

4 

T,-(x,y,-) = ^ajAr„(x,Aj) (3) 
i=i 

where Aj, j = 1,...,4, have been defined above. In the weighs of the 
selection step we set A(x,y) = 2/{Tj{x,y) +Tj(y,x)). 



4.1.1 Bivariate Mixture with two components 

We consider the following bivariate mixture of two normals 

^A/-2(/li,Si) + ^Ar2(/l2,S2) (4) 

with fj,^ = (0, 0)', /X2 = (10, 10)', Si = diag(0.1, 0.5) and S2 = diag(0.5, 0.1). 

In Fig. [1] the ACF with 30 lags for the first component of the bivari- 
ate MH chain. The autocorrelation is lower for the MTM algorithm with 
different proposals. 



4.1.2 Multivariate Normal Mixture 

We compare the algorithms for high-dimensional targets. We consider the 
following multivariate mixture of two normals with a sparse variance-covariance 
structure 

^AA2o(/2l,Si) + ^AA2o(/X2,S2) (5) 

with fii = (3, . . . , 3)', /X2 = (lOi • • • 5 10)' ^iifl ^j; ■^ith j = 1, 2, generated 
independently from a Wishart distribution Sj ~ W2o(i^) Id2o) where u is the 
degrees of freedom parameter of the Wishart. In the experiments we set 
1/ = 21. 

The autocorrelation function of the chain for one of the experiment is 
given in Fig. [2j The ACF has been evaluated for each components of the 
20-dimensional chain. The values of the ACF of the MTM-DP (black lines 
in Fig. [2]) are less than those of the original MTM (gray lines of the same 
figure) in all the directions of the support space. We conclude that the MTM 



algorithm with different proposals (Algorithm [3]) outperforms the lLiu et al. 
(jioOfl) MTM algorithm. 
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Figure 1 : Autocorrelation function of the Liu's MTM (gray line) and MTM 
with different proposals (black line). 
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Figure 2: Autocorrelation function (ACF) of the 20 components of the mul- 
tivariate MH chain for the Liu's MTM (gray lines) and MTM with different 
proposals (black lines). 
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4.2 Multiple-chains results 

In this section we show real and simulated data results of the general inter- 
acting multiple try algorithm in Alg. [TJ 

4.2.1 Bivariate Mixture with two components 

The target distribution is the following bivariate mixture of two normals 

^AA2(//l,Si) + ^AA2(/X2,S2) (6) 

with fj,^ = (0, 0)', /X2 = (10, 10)', Si = diag(0.1, 0.5) and S2 = diag(0.5, 0.1). 

We consider a population of = 50 chains with M = 50 proposals and 
1,000 iterations of the IMTM algorithm. For each chain we consider the 
case 7;«(y|xi^7^),x«,xr^^^)) = rO)(y, x^f^) and draw 

y,~A/2(xi^'U,) (7) 

where Aj = (0.1 + 5j)Id2- In this specification of the IMTM algorithm 
each chain has M independent proposals with conditional mean given by 
the previous values of the chains in the population. 

In this experiment we consider two kind of weights. First we set A^*^ (x, y) = 

{Tj^\x,y)Tj^\y,x))~^ , that corresponds to use importance sampling selec- 
tion weights, secondly we consider \^\x,y) = 2{Tj^\x,y) + Tj^\y , x))^^ , 
which implies a symmetric MTM algorithm. We denote with IMTM-IS and 
IMTM-TA respectively the resulting algorithms. 

Fig. H show the results of the IMTM-IS and IMTM-TA algorithms at 
the last iteration of the population of chains (black dots). In both of the 
algorithms the population is visiting the two modes of the distribution in 
the right proportion. Moreover each chain is able to jump from one mode 
to the other, (the light-gray line represents the sample path of one of the 
chain) . 

4.2.2 Beta-Binomial Model 

We consider here the problem of the genetic instability of esophageal can- 
cers. During a neoplastic progression the cancer cells undergo a number of 
genetic changes and possibly lose entire chromosome sections. The loss of 
a chromosome section containing one allele by abnormal cells is called Loss 
of Heterozygosity (LOH). The LOH can be detected using laboratory assays 
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Figure 3: Log-target level sets (solid lines), the MH chains (red dots) at the 
last iteration of the IMTM algorithms and the path of one of the chain of 
the set (gray line). Up: output of the IMTM-IS with importance sampling 
selection weights. Bottom: ouput of the IMTM-TA with symmetric lambda 
as in (c). 1^ 



on patients with two different alleles for a particular gene. Chromosome re- 
gions containing genes which regulate cell behavior, are hypothesized to have 
a high rates of LOH. Consequently the loss of these chromosome sections 
disables important cellular controls. 

Chromosome regions with high rates of LOH are hypothesized to contain 
Tumor Suppressor Genes (TSGs), whose deactivation contributes to the 
development of esophageal cancer. Moreover the neoplastic progression is 
thought to produce a high level of background LOH in all chromosome 
regions. 

In order to discriminate between "back ground" and TSGs LOH, the 
Seattle Barrett's Esophagus research project ( Barrett et al. (jl996l )) has col- 



lected LOH rates from esophageal cancers for 40 regions, each on a distinc t 
chromosome arm. The labeling of the two groups is unknown so besail hood ) 



suggest to consider a mixture model for the frequency of LOH in both the 
"background" and TSG groups. 

We c onsid er the hierarchical Beta-Binomial mixture model proposed in 
Warned (|200lh 



n\ r(l/a;2) r{x + iT2/uj2)'['{n - x + {1 - tt2)/u}2) 



/(x,n|r/,7ri,7r2,7) = r?( "J )<•(! - vrir"- + (8) 

~'^\xj r(7r2/LJ2)r((l - 7r2)M) r(n + I/C02) 

with X number of LOH sections, n the number of examined sections, 002 = 
exp{7}/(2(l + exp{7})). Let x = (xi, . . . ,Xm) and n = (ni, . . . ,nm) be a 
set of observations from /(x, n|7/, vri, 7r2, 7) and let us assume the following 
priors 

^~^o,i]i '^I'^^lo,!], 7r2~W[o,i] and 7 ~ ^-30,30] (9) 
with U the uniform distribution on [a, b]. Then the posterior distribution is 

m 

vr(r/,7ri,7r2,7|x,n) cx J| /(a;^, nj|r/, vri, 7r2, 7) (10) 
i=i 

The parametric space is of dimension four: (??, tti, 712, 7) G [0,1]^ x [—30,30] 
and the posterior distribution has two well-separated modes making it dif- 
ficult to sample using generic methods. 

We apply the IMTM algorithm with 100 iterations, M = 4 proposal 
functions randomly selected between a population of = 100 chains. For 
each chain we consider importance sampling weights in the selection step. 
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Figure 4: Values of the population of chains (dots) at the last iteration on 
the subspace (7ri,7r2). The interaction is given by M = 4 proposal functions 
randomly selected between the population of = 100 chains. 



i.e. we set X^^\x, y) 



with j = 1, . . . , 4 and 



1, . . . , 100. The values of the population of chains (dots) at the last iteration 
on the subspace (vri,7r2) is given in Fig. I4.2.2[ The IMTM is able to visit 
both regio ns of the parameter space confirming the analysis of ICraiu et al 
(|2009l ) and lWarned \200l\ ). 



4.2.3 Stochastic Volatility 



The estimation of the stochastic volatility (SV) m odel due tolTavloij (119941 ') 
still repres ents a challenging issue in both off-hne (jCeleux et al.l ^2m(h ) and 
sequential ( Casarin and Marin ( 20091 )) inference contexts. One of the main 
difficulties is due to the high dimension of the sampling space which hinders 
the use of the data- augmentation and prevents a reliable jo int estimation 



of the parameters and the latent variables. As highlighted in lCasarin et al 



(|2009l ) using multiple chains with a chain interaction mechanism could lead 
to a substantial improvement in the MCMC method for this kind of model. 
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We consider the SV model given in ICeleux et all (|2006l ) 



yt\ht 
ht\ht-i,0 

ho\e 



AA(O,aV(l-0')) 



with t = 1, . . . ,r and = (a, 4), cr^)- For t he parameters we assume the 
noninformative prior (see ICeleux et all . l2006l ) 

7r(0)ocl/M)I(_i,i)(0) 

where /^^ = exp(a). In order to simulate from the posterior we consider 
the full conditional distributions and apply a Gibbs algorithm. If we define 
y = (yi, . . . , ut) and h = (/iq, • • • , /it) then the full conditionals for /3 and (j) 
are the inverse gamma distributions 

/3'|h,y - X^|^^y^xp(-M/2,(r-l)/2^ 

a2|0,h,y - Xg(^{ht-^ht^if/2 + hl{l-cf'),{T -l)/2^ 
and (j) and the latent variables have non-standard full conditionals 



Ti{(t)\a ,h,y) oc (1 



exp 



1 



i=2 



t=2 



7r(/it|Q;>,(72,h,y) oc exp ( ((/if - a - (t)ht-if- 



{ht+i - a- (t)htf) -]^{ht + yt exj){-ht))^ . 



In order to sample from the posterior we use an IMTM within Gibbs algo- 
rithm. A d etailed descrip t ion o f the proposal distributions for (p and ht can 
be found in ICeleux et all (|2006l l. 

We consider the two parameter settings («,(/>, a^) = (0,0.99,0.01) and 
(a, cT^) = (0,0.9,0.1) which correspond, in a financial stock market con- 
te xt, to daily and weekly fre quency data respectively. Note that as reported 



m 



Casarin and Marin ( 20091 ) inference in the daily example is more difficult. 



We compare the performance of MH within Gibbs and IMTM within Gibbs 
algorithms in terms of Mean Square Error (MSE) for the parameters and 
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of cumulative RMSE for the latent variables. We carry out the compari- 
son in statistical terms and estimate the MSE and RMSE by running the 
algorithms on 20 independent simulated datasets of 200 observations. In 
the comparison we take into account the computational cost and for the 
IMTM within Gibbs we use = 20 interacting chains, 1,000 iterations and 
M = 5 proposal functions. This setting corresponds to 100,000 random 
draws and is equivalent to the 100,000 iterations of the MH within Gibbs 
algorithm used in the comparison. Note that the proposal step of the IMTM 
selects at random the proposal functions between the other chains and the 
selection step uses X^j\x,y) = (Tj^\x , y)Tj^\y , x))~^ , with h = 1, . . . ,5 and 
i = l,...,20. 

The results for the parameter estimation when applying IMTM are pre- 
sented in Table [T] and show an effective improvement in the estimates, both 
for weekly and daily data, when compared to the results of a MH algorithm 
with an equivalent computational load. 





Daily Data 




Weekly Data 


9 


Value 


MSE 
IMTM MH 


9 


Value 


MSE 
IMTM MH 


a 





0.04698 
(0.00612) 


0.09517 
(0.00194) 


a 





0.00146 
(0.00139) 


0.00849 
(0.00105) 


<P 


0.99 


0.20109 
(0.02414) 


0.34825 
(0.05187) 


<P 


0.9 


0.01328 
(0.04014) 


0.10746 
(0.03629) 


a' 


0.01 


0.00718 
(0.00173) 


0.02380 
(0.00202) 


a' 


0.1 


0.00136 
(0.00141) 


0.09175 
(0.00358) 



Table 1: Mean square error (MSE) and its standard deviation (in paren- 
thesis) for the parameter estimation with IMTM and MH within Gibbs 
algorithms. Left panel: daily datasets. Right panel: weekly dataset. 

A typical output of the IMTM for some chains of the population and 
for the latent variables h is given in Fig. 14.2.31 Each chart shows for a 
given chain the estimated latent variables (dotted black line), the posterior 
quantiles (gray lines) and the true value of h (solid black line) . 

Figures 14.2.31 and 14.2.31 exhibit the HPD region at the 90% (gray areas) 
and the mean (black lines) of the cumulative RMSE of each algorithm for 
the weekly and daily data, respectively. The statistics have been estimated 
from 20 independent experiments. The average RMSE shows that, in both 
parameter settings considered here, the IMTM (dashed black line) is more 
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efficient than the standard MH algorithm (sohd black line). 

5 Conclusions 

In this paper we propose a new class of interacting Metropolis algorithm, 
with Multiple Try transition. These algorithms extend the existing literature 
in two directions. First we show a natural and not straightforward way to 
include the chains interaction in a multiple try transition. Secondly the 
multiple try transition has been extended in order to account for the use 
of different proposal functions. We give a proof of validity of the algorithm 
and show on real and simulated examples the effective improvement in the 
mixing property and exploration ability of the resulting interacting chains. 

Appendix A 

Proof 

Without loss of generality, we drop the chain index i and the iteration index 

n, setMi = N, \/i andx^ = x and denote withTj{y,x) = Tj{y\x^^i ^\xn\xn'^^''^^) 

\^-\yj,x^) the j proposal of the i-th chain at the iteration n conditional on 

the past and current values, x^^l^^ and Xn^^'^^ respectively, of the other 
chains. 

Let us define the following quantities 

N N 
w{yi:N\x) = ^Wj{yj,x), w_k{yi:N\x) = ^Wj{yj,x) 
j=l j^k 

and 

1 ^ 

with J G t7 = {1, . . . , iV} the empirical measure generated by different pro- 
posals and by the normalized selection weights. 

Let T{x,dyi:N) = (^f=iTj{x,dyj) : {y x B{y^)) ^ [0, 1] the joint pro- 
posal for the multiple try and define T_fe(.x, (iyi-jy) = ^f-^k Tj{x,dyj). Let 
A{x, y) be the actual transition probability for moving from x to y in the 
IMT2 algorithm. Suppose that x ^ y, then the transition is a results two 
steps. The first step is a selection step which can be written as y = yj 
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Figure 5: Typical output of the population of chains for a weekly dataset. 
For each chain the estimated latent variable (dotted black line), the 2.5% 
and 97.5% quantiles (gray lines) and the true value of h (solid black line). 
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Figure 6: Cumulative RMSE for the IMTM (dashed hue) and MH (sohd 

Une) and the 90% HPD regions for the IMTM (hght gray) and MH (dark 
gray) estimated on 20 independent experiments for both the daily (up) and 
weekly (bottom) datasets. 
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and x*j = X with the random index J sampled from the empirical measure 

S]\f{ J). The second step is a accept/reject step based on the generalized MH 
ratio which involves the generation of the auxiliary values Xj for j ^ J. 
Then 

Tr{x) A{x,y) = 



x5x{dxj)5yj{dy) miul 1, 



'w{x\.N\y) 

= 7r(x)V/ T_j{x,dyi:N)Tj{x,y) T_j{y,dx\.j^) 



Wj{y,x) min[i Wj{y,x) + W-j{yi.,N\x) 

Wjiy,x) + w-j{yi:N\x) \ ' Wj{x,y) + iij-j{xl.j^\y) 

V" Wj{x,y)wj{y,x) f t rl„ \ ^ 

> 7—, X / l-j{X,dyi:N) X 

p[ ><j[y,x) Jy^iN-1) 



xT_j{y,dxl.j^) min 



1 



Wjiv^x) +w-j{yi;N\xy Wj{x,y) +w-j{xl.^\y) 
which is symmetric in x and y. 
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